Biostatistics For Dummies, 2nd Edition (Monika Wahi, John Pezzullo)

CHAPTER 12 Comparing Proportions and Analyzing Cross-Tabulations 161

» Most statistical software is also set up so that you can do these tests using

summarized data (rather than individual-level data), so long as you set an

option in your programming when running the tests. In contrast, online

calculators that execute these tests expect you to have already cross-tabulated

the data. These calculators usually present a screen showing an empty table,

and you enter the counts into the table’s cells to run the calculation.
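To make the difference between individual-level data and summarized data concrete, here is a minimal Python sketch that cross-tabulates a handful of records into a table of counts, the form an online calculator would expect. The records, the level names, and the helper function cross_tabulate are all hypothetical, invented for illustration only.

```python
from collections import Counter

# Hypothetical individual-level data (not from the book): one
# (row_level, column_level) pair per participant.
records = [
    ("drug", "improved"), ("drug", "improved"), ("drug", "not improved"),
    ("placebo", "improved"), ("placebo", "not improved"), ("placebo", "not improved"),
]

def cross_tabulate(pairs):
    """Summarize individual-level pairs into a table of cell counts."""
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    # One list per row level, one count per column level.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

row_levels, col_levels, table = cross_tabulate(records)
print(col_levels)
for level, row in zip(row_levels, table):
    print(level, row)
```

The resulting table holds the same information as the six records, just in summarized form, which is why a test can usually be run from either one.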

Examining Two Variables with the

Pearson Chi-Square Test

The most commonly used statistical test of association between two categorical

variables is the chi-square test of association, developed by Karl Pearson

around the year 1900. It’s called the chi-square test because it involves calculating

a number called a test statistic that, when the null hypothesis is true, approximately follows the chi-square

distribution. Many other statistical tests also use the chi-square distribution, but

the test of association is by far the most popular. In this book, whenever we refer

to a chi-square test without specifying which one, we are referring to the Pearson

chi-square test of association between two categorical variables. (Please note that

some books use the notation χ² or X² instead of saying the term chi-square.)

Understanding how the

chi-square test works

You don’t have to understand the equations behind the chi-square test if you have

a computer to do the calculations, which is the best approach, although you can also calculate the test

manually. This means you technically don’t have to read this section. But we

encourage you to do so anyway, because we think you’ll have a better appreciation

for the strengths and limitations of the test if you know its mathematical

underpinnings. Here, we walk you through conducting a chi-square test manually

(which is possible to do in Microsoft Excel).

Calculating observed and expected counts

All statistical significance tests start with a null hypothesis (H0) that asserts that no

real effect is present in the population, and any effect you think you see in your

sample is due only to random fluctuations. (See Chapter 3 for more information.)

The H0 for the chi-square test asserts that there’s no association between the

levels of the row variable and the levels of the column variable, so you should

expect the relative spread of cell counts across the columns to be the same for

each row.
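The idea that every row should show the same relative spread under H0 can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard rule for expected counts (row total times column total, divided by the grand total). The 2-by-2 counts and the function name expected_counts are hypothetical and not taken from the book.

```python
# Hypothetical observed counts (illustrative only, not from the book).
# Each inner list is one level of the row variable; each position is
# one level of the column variable.
observed = [
    [30, 20],  # row 1
    [10, 40],  # row 2
]

def expected_counts(table):
    """Expected cell counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print(row)
# prints [20.0, 30.0] twice
```

In this made-up table, 40 percent of all participants fall in the first column, so under H0 you would expect 40 percent of each row to land there too. That gives 20 and 30 in both rows, even though the observed rows look quite different.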